Chromosome-level genome assembly of the flower thrips Frankliniella intonsa

As an economically important insect pest, the flower thrips Frankliniella intonsa (Trybom) causes great damage to host plants by feeding on them directly and by transmitting various pathogenic viruses. The lack of a well-assembled genomic resource has hindered our understanding of the genetic basis and evolution of F. intonsa. In this study, we used Oxford Nanopore Technology (ONT) long reads and high-throughput chromosome conformation capture (Hi-C) reads to construct a high-quality reference genome assembly of F. intonsa, with a total size of 225.5 Mb and a contig N50 of 3.37 Mb. Using the Hi-C data, we anchored 91.68% of the contig sequence into 15 pseudochromosomes. Genome annotation identified 17,581 protein-coding genes and classified 20.09% of the sequence as repeat elements. BUSCO analysis estimated genome completeness at over 98%. This study is the first to report a chromosome-scale genome for a species of the genus Frankliniella. It provides a valuable genomic resource for further biological research and pest management of the thrips.


Background & Summary
The flower thrips, Frankliniella intonsa (Trybom) (Thysanoptera: Thripidae), is a small insect pest, well known for feeding and dwelling on the flowers of host plants. It is widely distributed across the world, including Europe, Asia, Oceania, and North America 1,2, and is becoming the dominant thrips species in several areas of China 3,4. Owing to its rapid development and its sexual and parthenogenetic modes of reproduction, F. intonsa can cause severe damage to various commercial crops, such as cowpea and eggplant 5–9, and has rapidly developed resistance to insecticides such as spinosad 10. In addition to directly damaging leaves, flowers and fruits, F. intonsa can transmit a variety of plant pathogenic viruses, such as Tomato spotted wilt orthotospovirus (TSWV) and Chrysanthemum stem necrosis virus (CSNV), to host plants 11,12, resulting in destructive damage and huge economic losses every year. Interestingly, we found that the endosymbiont Wolbachia is present in F. intonsa but absent from the sibling species F. occidentalis (unpublished data). High-quality genomic resources are urgently needed to elucidate the key genetic mechanisms of flower thrips, such as virus transmission, pheromone biosynthesis, insecticide resistance and bacterial mutualism. To date, despite the large number of species in the family Thripidae (true thrips), only five species have publicly available genomes, including a scaffold-level genome assembly of western flower thrips (F. occidentalis, 415.8 Mb) 13, a chromosome-level genome assembly of melon thrips (Thrips palmi) 14, a scaffold-level genome assembly of tobacco thrips (F. fusca, 370 Mb) 15, and two chromosome-level genome assemblies of bean blossom thrips (Megalurothrips usitatus) reported by Ma et al. 16 (238.14 Mb) and our group 17 (247.82 Mb), respectively. However, no genome assembly has been reported for F. intonsa, nor a chromosome-level genome for any species of the genus Frankliniella. In this study, we report a high-quality chromosome-level genome assembly of F. intonsa using an integrated sequencing strategy including ONT, Illumina, and Hi-C. Our research provides valuable resources for studying the evolutionary genetics and molecular basis of F. intonsa.
To obtain a high-quality genome assembly of F. intonsa, a total of 31.63 Gb of ONT long reads (~124-fold coverage) and 15.71 Gb of NGS short reads (~62-fold coverage, 2 × 150 bp) were generated. Using the integrated sequencing data, we obtained a contig-level genome with a total size of 225.5 Mb, consisting of 405 contigs with an N50 length of 3.37 Mb and an N90 length of 161 kb (Table 1). The total length of the genome assembly is similar to the estimated genome size (approximately 234.5 Mb) based on 21-mer depth analysis (Fig. 1a). The total GC content is 52.73%, which is comparable to that of the other Thripidae species 13–17. To improve the contiguity of the genome assembly, we used 41.97 Gb (~165-fold coverage) of Hi-C data, which generated about 57 million Hi-C contacts, to scaffold the contigs. Approximately 91.68% of the contig sequence was successfully anchored into 15 pseudochromosomes ranging from 9.78 Mb to 20.82 Mb (Fig. 1b,c). We further performed BUSCO analysis to assess the completeness of the genome assembly against four datasets: Eukaryota, Metazoa, Arthropoda and Insecta (odb10). As a result, 96.9% of the conserved Eukaryota genes and more than 98% of the core genes in the other three datasets were identified, strongly suggesting a high level of completeness of the F. intonsa genome assembly (Fig. 1d).
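For readers unfamiliar with the contig N50 and N90 statistics reported above, the following is a minimal sketch of how such values are commonly computed from a list of contig lengths. The function name `nx_length` and the toy contig lengths are hypothetical and chosen only for illustration; this is not the pipeline used in this study.

```python
from typing import Sequence


def nx_length(lengths: Sequence[int], fraction: float) -> int:
    """Return the Nx length: the length of the contig at which the cumulative
    sum of contigs, sorted from longest to shortest, first reaches `fraction`
    of the total assembly length (fraction=0.5 gives N50, 0.9 gives N90)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    # Only reachable through floating-point rounding at fraction == 1.
    return min(lengths)


# Hypothetical toy contig lengths in bp (not data from this study).
toy_contigs = [50, 40, 30, 20, 10]
n50 = nx_length(toy_contigs, 0.5)
n90 = nx_length(toy_contigs, 0.9)
print(f"N50 = {n50} bp, N90 = {n90} bp")
```

In practice the lengths would be read from the assembly FASTA; a higher N50 for a similar total length indicates a more contiguous assembly.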
Using multiple repeat annotation tools, we constructed a repeat library containing 1,347 repeat consensus sequences for the F. intonsa genome. We then performed a genome-wide scan for repeat-associated regions based on this library. As a result, we annotated approximately 20.09% of the F. intonsa genome as repeat regions (Table 1). Notably, the repeat content of the F. intonsa genome is higher than that of the other published Thripidae genomes (F. occidentalis, 9.86%; M. usitatus, 15.05%; T. palmi, 6.45%) 13–17, suggesting an expansion of repeat elements in the F. intonsa genome.
A combined approach of ab initio prediction, homolog-based prediction, and transcript-based prediction was used to predict gene structures in the F. intonsa genome. This resulted in a total of 17,581 protein-coding genes, which is comparable to other Thripidae species. BUSCO analysis of the gene models showed that 89.5%, 93.8%, 94.4% and 94.0% of the core genes from the Eukaryota, Metazoa, Arthropoda and Insecta datasets, respectively, were complete, indicating a high level of completeness and credibility of the gene prediction results (Fig. 2). We then functionally annotated the protein-coding genes against five major databases: Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, Clusters of Orthologous Genes (COG) and Carbohydrate-Active enZYmes (CAZy). More than half of the genes (58.42%, 10,272/17,581) received at least one functional annotation (Fig. 3).
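The count of genes with "at least one functional annotation" is the size of the union of the per-database hit sets, not the sum of the per-database counts, because one gene can be annotated in several databases. A minimal sketch under that assumption follows; the function name `annotated_fraction` and the toy gene identifiers are hypothetical and not taken from this study.

```python
from typing import Dict, Set


def annotated_fraction(hits: Dict[str, Set[str]], total_genes: int) -> float:
    """Fraction of genes annotated in at least one database.

    `hits` maps a database name to the set of gene IDs annotated by it;
    genes found in several databases are counted once via a set union.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    annotated = set().union(*hits.values())
    return len(annotated) / total_genes


# Hypothetical toy annotation results for a genome of 8 genes.
toy_hits = {
    "GO": {"g1", "g2"},
    "KEGG": {"g2", "g3"},
    "Pfam": {"g1"},
    "COG": set(),
    "CAZy": {"g5"},
}
toy_fraction = annotated_fraction(toy_hits, total_genes=8)
print(f"{toy_fraction:.2%} of genes have at least one annotation")
```

For the assembly described here, the corresponding ratio is 10,272 annotated genes out of 17,581 predicted genes.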

Methods
Sampling, extraction, and sequencing. F. intonsa populations were collected on pepper (Capsicum annuum L.) in Jiaxing (30.75°N, 120.79°E), Zhejiang, China, in 2017, and reared on fresh bean, Phaseolus vulgaris, in a climate-controlled chamber (25 ± 1 °C, 16 L). Approximately 200 adult F. intonsa of mixed ages from the laboratory population were decontaminated by immersion in 1% sodium hypochlorite solution (Gaide chemical, Hangzhou, Zhejiang, China) for 5 min, followed by rinsing with sterile water, immersion in 70% ethanol, and a final rinse with sterile water. Samples were snap frozen in liquid nitrogen and stored at −80 °C.
Genomic DNA and mRNA were extracted and purified using the QIAGEN DNA/RNA tissue kit (QIAGEN 69506/73404, Hilden, Germany), and sequencing libraries were prepared according to the manufacturer's instructions for each sequencing technology (Nextomics Biosciences Co., Ltd, Wuhan, China). Long DNA fragments were sequenced on the Oxford Nanopore PromethION platform, and short-read sequencing, Hi-C sequencing and RNA-seq were performed on the Illumina NovaSeq 6000 platform (Table S1).

Technical Validation
Two different strategies were used to evaluate the completeness and accuracy of the F. intonsa genome. First, BUSCO analysis based on the Eukaryota, Metazoa, Arthropoda and Insecta (odb10) datasets revealed that 96.9%, 98.2%, 98.8% and 98.3% of the core genes, respectively, were identified as complete. Second, we re-aligned the NGS, ONT and RNA-seq reads to the F. intonsa genome, obtaining mapping rates of 92.80%, 90.48% and 88.63%, respectively. To evaluate the completeness and accuracy of the gene predictions, we performed BUSCO analysis of the gene models against the Eukaryota, Metazoa, Arthropoda and Insecta (odb10) datasets, which yielded completeness values of 89.5%, 93.8%, 94.4% and 94.0%, respectively.

Fig. 3 The number of genes annotated with the COG, Pfam, GO, KEGG and CAZy databases.

Fig. 1 Characteristics of the F. intonsa genome. (a) Estimation of genome size based on NGS reads using 21-mer analysis. (b) Hi-C heatmap showing the 15 pseudochromosomes of the F. intonsa genome. (c) Circos plot of the genomic characteristics of F. intonsa, including the chromosomes (outer circle), gene density (green circle), repeat density (blue circle), and paralogous genes (line plot). (d) BUSCO analysis based on the Eukaryota, Metazoa, Arthropoda and Insecta (odb10) datasets.

Fig. 2 BUSCO analysis of the gene models based on the Eukaryota, Metazoa, Arthropoda and Insecta (odb10) datasets.

Table 1. Statistics of the F. intonsa reference genome.